There exist ample demonstrations that indicators of
scholarly impact analogous to the citation-based ISI
Impact Factor can be derived from usage data. However,
unlike the ISI IF, which is based on citation
data generated by the global community of scholarly
authors, usage can so far only be practically recorded
at a local level, leading to community-specific
assessments of scholarly impact that are difficult to
generalize to the global scholarly community. We define a
journal Usage Impact Factor which mimics the
definition of Thomson Scientific's ISI Impact Factor.
Usage Impact Factor rankings are calculated on the
basis of a large-scale usage data set recorded for the
California State University system from 2003 to 2005.
The resulting journal rankings are then compared to
Thomson Scientific's ISI Impact Factor, which is used
as a baseline indicator of general impact. Our results
indicate that impact as derived from California State
University usage reflects the particular scientific and
demographic characteristics of its communities.

1 Introduction 

Usage of scholarly resources as recorded by digital 
information systems has been gaining acceptance as 
a tool to study the scholarly community. Usage data 
has been used to study trends in science (Bollen, Luce, 
Vemulapalli, & Xu, 2003) as well as to visually map the 
interests of certain subsets of the scholarly community 
(Bollen & Van de Sompel, 2006a). In addition, usage 
data has been shown to be a promising alternative to 
citation data in the assessment of scholarly impact. As 
early as 2001, Darmoni, Roussel, Benichou, Thirion, and
Pinhas (2002) proposed a reading factor to rank journals
according to their impact derived from a library's access
statistics. Bollen and Luce (2002) and Bollen, Van de
Sompel, Smith, and Luce (2005) propose the use of
social network metrics calculated for journal networks
derived from usage sequences in a library's access log.
Kurtz et al. (2004a, 2004b) discuss the potential of usage
data for impact ranking. Brody, Harnad, and Carr (2006)
later explore how early article usage statistics can predict
citation rates. In addition to these research developments,
practical standards for publisher-reported usage statistics
(COUNTER project¹) and their aggregation (SUSHI
project²) have been developed. Thomson Scientific
has recently included usage statistics in its ISI Web of
Knowledge product³.

Since usage data is recorded by particular information 
systems, the acquired data naturally pertains to the user 
community of those systems. For example, when Bollen 
and Luce (2002) rank journals according to their usage,
this is done on the basis of usage data recorded by the Los
Alamos National Laboratory (LANL) Research Library servers
and therefore reflects the preferences of the LANL
community. In a similar manner, the results reported
by Brody et al. (2006) apply to the user community
of the UK arXiv mirror⁴. A similar argument can be
made for the "citation-download correlation tool" of the
University of Southampton's CiteBase system⁵, which
uses download information from the UK arXiv mirror.

In all cases the community for which usage was 
recorded is delimited by the boundaries of a particular
service or information system. The resulting sample of 
the scholarly community that generated the usage data 
through its interaction with these systems is unknown 
both in terms of its diversity and span. The CiteBase user 
community could in fact be a diverse mix of undergrad- 
uate students, professors, university staff, laypersons, 
and scholars. Its span may or may not be limited to 
the United Kingdom. The resulting usage data and its 
subsequent analysis could therefore be shaped by a set 
of sample characteristics that are not well understood.
When usage statistics are treated as a population
statistic, the question emerges for which sample of
the scholarly community usage has been recorded, and
how the characteristics of that particular sample will
influence the outcomes of a subsequent assessment of
scholarly impact based on these statistics.

The issue of sampling permeates the field of schol- 
arly impact assessments, even where citation data is used. 
Thomson Scientific's ISI Impact Factor (ISI IF) is calcu- 
lated from citation rates recorded for a set of ISI-selected 
journals. The corresponding sample of the scholarly com- 
munity consequently has the following characteristics: 

1. Span: extends to the global set of scholarly authors. 

2. Diversity: limited to scholarly authors, and articles 
published in the set of ISI-selected journals. 

In spite of the latter limitation, the ISI IF is perceived 
to be based on a representative and respected sample 
which supports its general acceptance as an indicator of 
scholarly impact. 

In comparison to the ISI IF, usage-based assessments 
of scholarly impact are generally based on samples of the 
scholarly community with the following characteristics: 

1. Span: delimited by the local boundaries of a partic- 
ular information service. 

2. Diversity: extends to all user types who can request 
services for any type of scholarly communication 
unit. 

In order to realize impact measures derived from usage
data that could achieve the same level of acceptance as
the ISI IF, both of the above dimensions need to be
explored. The first dimension, i.e. span, entails
the aggregation of usage data across a wide range of
services to create a more global and representative sample
of the scholarly community, thereby increasing its span. In
fact, Bollen and Van de Sompel (2006b) propose an
architecture for the large-scale aggregation of usage data 
which could be employed to achieve such global samples. 
This architecture however only addresses the technical 
issues involved in aggregating such samples; it does not 
address the issue of what constitutes a representative
global sample, nor for which services usage should be
aggregated. The second dimension, i.e. diversity,
entails efforts to better understand and control how 
community characteristics, i.e. sample diversity, affect 
usage-based impact assessments, regardless of whether 
the sampled community is representative of the global 
scholarly community. 

Whereas Bollen and Van de Sompel (2006b) focus
on aspects of the first dimension, i.e. sample span, this
article addresses the second dimension, i.e. sample di- 
versity: studying the effects of sample characteristics on 
usage-based assessments of impact. Usage of scholarly 
resources for all 23 California State University (CSU) 
campuses, comprising about 405,000 students and 44,000 
faculty and staff, was recorded throughout the entire Oc- 
tober 2003 to August 2005 period by the CSU linking 
servers (Van de Sompel & Beit-Arie, 2001), thereby gen- 
erating an extensive, high-granularity usage data set cov- 
ering one of the world's largest and most diverse schol- 
arly communities. A simple Usage Impact Factor (UIF) 
was defined to mimic the definition of the ISI IF and was 
then used to determine journal rankings on the basis of 
the recorded CSU usage data. Correlations between the 
resulting CSU UIF and ISI IF rankings are determined for 
a set of scholarly disciplines, demarcated by ISI journal 
classification codes. These correlations are then matched 
to the demographic features of the CSU community to 
yield insights into how they affect usage-based assess- 
ment of impact. 

2 Background 

2.1 Citation Impact Factor 

The IF of a particular journal in a particular year as 
defined by Garfield (1979) is determined by counting 
the number of citations that occur in a given year to 
articles published in the journal during the two previous 
years and dividing that number by the total number of 
published items in that two-year period. As such, the IF
corresponds to the average number of citations that articles
published in a particular journal over a two-year period
receive in a given year.

More formally, the IF can be defined as follows. We
denote the set of (citable) articles published in journal
$j$ in year $y$ as $A_j^y = \{a_1, a_2, \ldots, a_n\}$, where
$a_i \in A_j^y$ represents an article published in journal $j$ in
year $y$. We introduce the citation function $C^y$ that maps
a set of citable articles to the number of times these
articles were cited by articles published in year $y$, i.e.
$C^y(A) \rightarrow \mathbb{N}$. It follows that $C^y(A_j^k)$ returns the number
of citations recorded in year $y$ that point to the set of
articles published in journal $j$ in year $k$.

The IF of a journal $j$ in year $y$, denoted $\mathrm{IF}_j^y$, is defined
as the ratio of two quantities:

$$\mathrm{IF}_j^y = \frac{C^y\left(A_j^{y-1} \cup A_j^{y-2}\right)}{\left|A_j^{y-1} \cup A_j^{y-2}\right|} \qquad (1)$$

where $C^y(A_j^{y-1} \cup A_j^{y-2})$ represents the number of citations
in year $y$ to all citable articles published in journal $j$ in
the two preceding years $y-1$ and $y-2$, and
$|A_j^{y-1} \cup A_j^{y-2}|$ represents the number of citable articles
published by journal $j$ in the two preceding years $y-1$
and $y-2$.

2.2 Usage Impact Factor

A similar reasoning can be applied to the definition of a
Usage Impact Factor (UIF), which can be framed in terms
of the rate at which articles published in a particular
journal over a 2 year period are used, rather than cited.
Analogous to the IF, we define the Usage Impact Factor
of journal $j$ in year $y$, denoted $\mathrm{UIF}_j^y$, as follows. We
replace the citation function $C^y(A_j^k)$ with the usage
function $R^y(A_j^k) \rightarrow \mathbb{N}$, which returns the number of times the
articles in $A_j^k$ are used in year $y$. The UIF can then be
defined as the ratio between two quantities:

$$\mathrm{UIF}_j^y = \frac{R^y\left(A_j^{y-1} \cup A_j^{y-2}\right)}{\left|A_j^{y-1} \cup A_j^{y-2}\right|} \qquad (2)$$

where $R^y(A_j^{y-1} \cup A_j^{y-2})$ represents the number of uses
recorded in year $y$ of articles published in journal $j$ in the
two preceding years $y-1$ and $y-2$, and
$|A_j^{y-1} \cup A_j^{y-2}|$ represents the number of articles
published by journal $j$ in the two preceding years $y-1$
and $y-2$.

To ensure that the IF and UIF for a particular journal
are determined on the basis of similar samples, the UIF
denominator $|A_j^{y-1} \cup A_j^{y-2}|$ is chosen to be that of the IF,
namely the number of citable items published by journal
$j$ in years $y-1$ and $y-2$. In other words, the number of
citable and "usable" articles in a journal is considered the
same quantity for a particular year.

The UIF expresses the average number of times an article
published in a journal within a 2 year period is used in a
particular year, much like the IF expresses the average number
of times such an article is cited in a particular year. The
similarities between the IF and the UIF are clarified in Fig. 1.

[Figure 1 diagram: for all articles in journal j, the 2004 IF_j is derived from # citations and # publications; the 2004 UIF_j is derived analogously from usage and the same # publications.]

Figure 1: Usage Impact Factor (UIF) defined in analogy
to the ISI Impact Factor (ISI IF). 
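To make the shared structure of Eqs. (1) and (2) concrete, the following is a minimal Python sketch, assuming the three counts per journal and target year are already available. The `JournalYear` record and its field names are hypothetical, and the numbers in the example are invented for illustration only. The point it shows is that the IF and UIF differ only in their numerator; the denominator, the number of citable items from the two preceding years, is identical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JournalYear:
    """Hypothetical per-journal counts for a target year y (field names are illustrative)."""
    citations_y: int    # C^y(A_j^{y-1} U A_j^{y-2}): citations in year y to items from y-1, y-2
    uses_y: int         # R^y(A_j^{y-1} U A_j^{y-2}): uses (e.g. full-text downloads) in year y
    citable_prev2: int  # |A_j^{y-1} U A_j^{y-2}|: citable items published in y-1 and y-2


def _check(j: JournalYear) -> None:
    """Both ratios are undefined for a journal without citable items."""
    if j.citable_prev2 <= 0:
        raise ValueError("journal has no citable items in the two preceding years")


def impact_factor(j: JournalYear) -> float:
    """Eq. (1): citations in year y per citable item of the two preceding years."""
    _check(j)
    return j.citations_y / j.citable_prev2


def usage_impact_factor(j: JournalYear) -> float:
    """Eq. (2): uses in year y divided by the same denominator as the IF."""
    _check(j)
    return j.uses_y / j.citable_prev2


# Invented example numbers, for illustration only.
example = JournalYear(citations_y=300, uses_y=120, citable_prev2=200)
print(impact_factor(example), usage_impact_factor(example))  # 1.5 0.6
```

Because both functions divide by `citable_prev2`, any difference between a journal's IF and UIF comes entirely from how often its recent articles are cited versus used.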



In this work, we use the full-text downloads of an arti- 
cle as an approximation of article usage. A similar prob- 
lem of approximation exists in citation analysis where 
author motivations to cite a particular article can vary 
strongly (MacRoberts & MacRoberts, 1989) and a citation
can express any modality of agreement or interest.
Unlike citation data, which lacks any formal indication
of author motivation, usage logs typically do specify
the user request type, thereby allowing a careful selection
of which request types to consider for a particular analysis. Although
yet finer distinctions can be made between different types
of usage, e.g. surveys to determine actual reading rates
(King, Tenopir, & Clarke, 2006), such an investigation
was beyond the scope of this study; full-text downloads
were considered to be the most reliable, if somewhat
partial, indicator of usage.

2.3 Data Acquisition 
2.3.1 Sample Considerations 

The significance of sample span and diversity was
outlined in the introduction. Therefore, when discussing
usage- or citation-based metrics of impact, two
orthogonal factors need to be taken into account:

1. The characteristics of the sample that the specific 
metric has been calculated for, i.e. sample span and 
diversity, 

2. The formal definition of a metric as an indicator of 
scholarly impact. 

This perspective is represented in Fig. 2. The IF, as
defined in Eq. (1), can be calculated for any set of journal
citation data. However, the most common instantiation
of the IF is the one published by Thomson Scientific's
ISI. This ISI IF is calculated on the basis of citation data
for a core set of about 8000 ISI-selected journals. With
regard to the span of its sample, the ISI IF places no
restrictions on the origin or affiliation of authors and 
therefore represents a global sample of the scholarly 
community, albeit one whose diversity is limited by the 
focus on authors who published journal articles in the set 
of ISI-selected journals. 

The IF can also be calculated for local citation samples.
For example, McDonald (2006) extracts citation data
pertaining only to California Institute of Technology
authors to determine local citation impact. This approach
results in a Local Impact Factor (LIF), as indicated in Fig. 2.

The UIF as defined in Eq. (2) can in principle be
calculated for any usage data set, but the nature of usage
data is such that it is generally recorded for the local user
communities of a specific service. This paper reports on
UIF values calculated on the basis of a usage data set for the
California State University system, which corresponds to
a local, CSU-specific sample of the scholarly community. 
We therefore label the consequent UIF values "CSU UIF" 
to indicate the fact that they apply to local CSU usage. 
Figure 2: Two orthogonal factors: formal metric defini-
tion and the sample to which it has been applied. 



The aggregation of usage data sets across different ser- 
vices and institutions may in the future yield increasingly 
global samples of the scholarly community. The resulting 
UIF rankings would then reflect a more global rather than 
a local, institutional sample of the scholarly community. 
Such metrics are labeled Global Usage Impact Factor
(GUIF) in Fig. 2.

This paper outlines a comparison of the globally
oriented ISI IF, which is used as a baseline indicator of
general impact, versus the CSU UIF, which represents a local,
CSU-specific facet of scholarly impact. It is, however,
conceivable that, once aggregated usage data becomes
available, a comparison between the CSU UIF and the GUIF,
the latter used as a global baseline, could be equally
informative.

2.3.2 ISI IF Citation Data 

ISI IF values were extracted from the 2004 Journal Ci- 
tation Reports (JCR) that are published on a yearly basis 
by Thomson Scientific's ISI. Combined, the Science and
Social Sciences editions of the 2004 JCR contained impact
factors for 7,356 scholarly journals.

2.3.3 CSU UIF Usage Data 

A large-scale usage log was created by aggregating usage 
data recorded by the linking servers (Van de Sompel, 
1999a, 1999b; Van de Sompel & Beit-Arie, 2001) of the 
entire California State University system in the period 
October 2003 to August 2005. Recording started Novem- 
ber 11th, 2003 (10:44 AM) and continued uninterrupted 
until August 8th, 2005 (11:43PM). Linking server logs 
aggregate usage across all OpenURL-enabled informa- 
tion services, and thereby contain records of all user 
requests, including abstract requests and full-text
downloads. They may additionally provide extensive usage,
document and user metadata, which allows e.g. requester
type, request types and publication dates to be taken
into account when considering usage-based indicators of
scholarly impact. As linking servers become increasingly
prevalent, they are gaining importance as tools by which
library services can record usage (Gallagher, Bauer, &
Dollar, 2005; McDonald, 2006).

Usage for nine major institutions, i.e. Chancellor,
California Polytechnic State University, CSU Los Angeles,
CSU Northridge, CSU Sacramento, San Jose State
University, CSU San Marcos, San Diego State University,
and San Francisco State University, was retained
since they had recorded usage data most consistently and
reliably, and represented the majority of CSU linking
server data. A total of 3,679,325 unique usage events
were thus recorded in the resulting master log for a total
of 176,575 users (identified by their IP addresses⁶),
requesting services for 1,657,312 unique documents. A
majority of the requests, i.e. 73%, pertained to journal 
articles. A range of service request types was recorded, 
including but not limited to full-text downloads, requests 
for holding information, requests for journal citation data 
and abstract requests. 

The resulting master log was then filtered to include
only events conforming to the following criteria:

1. Article full-text downloads. 

2. Year of download was 2004. 

3. Download concerned articles published in 2002 and 
2003. 
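The three filtering criteria can be expressed as a single predicate over log records. The sketch below is a minimal illustration, assuming a simplified, hypothetical `UsageEvent` record; the actual CSU linking server log schema is not described here, so the field names and the `"fulltext"` request-type label are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class UsageEvent:
    """Hypothetical, simplified linking-server log record (not the CSU log schema)."""
    request_type: str   # assumed labels such as "fulltext", "abstract", "holdings"
    request_year: int   # year in which the request was logged
    pub_year: int       # publication year of the requested article
    journal: str        # journal identifier, e.g. a JCR title abbreviation


def filter_for_uif(events: Iterable[UsageEvent], year: int = 2004) -> List[UsageEvent]:
    """Keep full-text downloads logged in `year` for articles published in the two prior years."""
    window = {year - 1, year - 2}
    return [
        e for e in events
        if e.request_type == "fulltext"   # criterion 1: full-text downloads only
        and e.request_year == year        # criterion 2: downloaded in the target year
        and e.pub_year in window          # criterion 3: published in year-2 or year-1
    ]


log = [
    UsageEvent("fulltext", 2004, 2003, "SEX ROLES"),
    UsageEvent("abstract", 2004, 2003, "SEX ROLES"),  # dropped: not a full-text download
    UsageEvent("fulltext", 2005, 2003, "SEX ROLES"),  # dropped: wrong usage year
    UsageEvent("fulltext", 2004, 2001, "CELL"),       # dropped: published too early
    UsageEvent("fulltext", 2004, 2002, "CELL"),
]
kept = filter_for_uif(log)
print(len(kept))  # 2
```

Parameterizing the target year keeps the two-year publication window tied to it, mirroring the definition in Eq. (2).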

A total of 140,675 usage requests remained after this 
filtering. These events pertained to articles published in 
6,423 unique journals. The number of full-text article 
downloads was tallied for each of these journals. The
resulting download frequency table was then merged with
the 2004 ISI IF data, resulting in a list of 3,146 journals
for which download data as well as non-zero ISI IF were
available. Following Eq. (2), the journal download
frequency values were then divided by the number of citable
articles as was used to calculate the 2004 ISI IF, result- 
ing in 2004 UIF values in conjunction with a 2004 ISI IF 
value for each journal. 
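The tally, merge and divide steps can be sketched as follows, assuming the filtered events have been reduced to one journal identifier per download and that the ISI IF values and citable-item counts are available as dictionaries. The function name `csu_uif` and the toy data are hypothetical; the sketch only mirrors the steps described above (drop journals without a non-zero ISI IF, then divide downloads by the ISI citable-item count).

```python
from collections import Counter
from typing import Dict, Iterable


def csu_uif(downloaded_journals: Iterable[str],
            isi_if: Dict[str, float],
            citable_items: Dict[str, int]) -> Dict[str, float]:
    """Per-journal UIF from a stream of filtered download events.

    downloaded_journals: one journal identifier per retained full-text download.
    isi_if: journal -> 2004 ISI IF; journals without a non-zero IF are dropped.
    citable_items: journal -> citable items used as the ISI IF denominator.
    """
    downloads = Counter(downloaded_journals)  # tally downloads per journal
    return {
        j: n / citable_items[j]
        for j, n in downloads.items()
        if isi_if.get(j, 0.0) > 0 and citable_items.get(j, 0) > 0
    }


# Toy data, invented for illustration.
uif = csu_uif(["A", "A", "A", "B", "C"],
              isi_if={"A": 2.0, "B": 0.0},
              citable_items={"A": 6, "B": 10})
print(uif)  # {'A': 0.5}
```

Journal "B" is dropped because its ISI IF is zero and "C" because it has no ISI IF at all, which corresponds to the merge that reduced 6,423 downloaded journals to the 3,146 with both data sources.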

⁶ It is acknowledged that IP addresses do not uniquely identify
individual users. However, the presented analysis relies on overall article
download frequencies and does not require unique user identification.



3 Results 

3.1 CSU UIF journal rankings 

Table 1 lists the 10 journals with the highest 2004 CSU UIF
as well as their 2004 ISI IF values. This list reveals a
strong social science focus in the CSU community. The
journals Topics in Early Childhood Special Education (TOP
EARLY CHILD SPEC), Hispanic Journal of Behavioral Sciences
(HISPANIC J BEHAV SCI), Intervention in School and
Clinic (INTERV SCH CLIN) and Monographs of the
Society for Research in Child Development (MONOGR
SOC RES CHILD) are found at the top of the list. The
low 2004 ISI IF values of most of these journals indicate a
strong discrepancy between the degree to which journals are
used by the CSU community and their overall scholarly
impact as indicated by the 2004 ISI IF.

The 10 journals with the highest 2004 ISI IF values are
listed on the right-hand side of Table 1, along with their
CSU UIF values. This ISI IF ranked list contains journals
with high impact factor rankings such as Nature, Science, 
New England Journal of Medicine (NEW ENGL J MED), 
Cell and the Journal of the American Medical Associa- 
tion (JAMA). The corresponding 2004 CSU UIF values 
are relatively low for these journals in spite of their high 
2004 ISI IF rankings. 

3.2 Correlating CSU UIF and the ISI IF 

The Spearman rank order correlation coefficient between 
2004 CSU UIF and 2004 ISI IF values was found to be 
-0.207 (p-value < 0.001, N=3,164) indicating a modest 
negative correlation between usage and the ISI IF for the 
California State University community. This negative 
relationship is confirmed by the log-log scaled scatterplot
in Fig. 3. Some of the journals at the extremities of
the scatterplot are labeled. It is notable that the journals
with a high ISI IF value (top of plot), regardless of their
2004 CSU UIF values, mostly belong to medicine.
In addition, several prominent physics journals
(e.g. Physical Review B and Physical Review
Letters) are located in the quadrant of the plot which
corresponds to high ISI IF and low CSU UIF values.
In other words, they are considered high impact in the 
general scholarly community but their articles are used 
relatively infrequently in the CSU community. 
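For readers who want to reproduce this kind of coefficient, the following is a minimal pure-Python sketch of Spearman's rank-order correlation, computed as the Pearson correlation of the two rank vectors, with tied values assigned their average rank. It is an illustration rather than the authors' own tooling, and it omits significance testing; library routines such as SciPy's `scipy.stats.spearmanr` also report a p-value. The usage example applies it only to the ten highest-UIF journals of Table 1, so its result is not comparable with the reported system-wide coefficient.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the run while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy)


# UIF04 and IF04 of the ten highest-UIF journals in Table 1 (illustrative subset only).
uif04 = [6.759, 6.720, 6.017, 5.571, 5.000, 4.964, 4.804, 4.723, 4.653, 4.513]
if04 = [0.862, 0.500, 0.172, 7.286, 1.750, 0.491, 0.639, 0.855, 0.224, 2.128]
print(round(spearman_rho(uif04, if04), 3))  # -0.055
```

Using ranks rather than raw values makes the coefficient insensitive to the strongly skewed distributions of both UIF and IF, which is also why the log-log scatterplot and the rank correlation tell a consistent story.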

This comparison of 2004 CSU UIF and 2004 ISI IF
values fails to take into account variations among the
different disciplines in the CSU system. A set of
discipline-specific comparisons of the correlation between
the 2004 CSU UIF and 2004 ISI IF is therefore provided in
the following sections.

Ordered by 2004 CSU UIF                        Ordered by 2004 ISI IF
Rank  Title                   UIF04    IF04    Title                  UIF04    IF04
1     TOP EARLY CHILD SPEC    6.759   0.862    ANNU REV IMMUNOL       0.059  52.431
2     HISPANIC J BEHAV SCI    6.720   0.500    CA-CANCER J CLIN       0.667  44.515
3     INTERV SCH CLIN         6.017   0.172    NEW ENGL J MED         0.262  38.570
4     MONOGR SOC RES CHILD    5.571   7.286    PHYSIOL REV            0.164  33.918
5     J SCHOOL PSYCHOL        5.000   1.750    NATURE                 0.277  32.182
6     J FAM VIOLENCE          4.964   0.491    SCIENCE                0.288  31.853
7     SEX ROLES               4.804   0.639    ANNU REV BIOCHEM       0.077  31.538
8     J YOUTH ADOLESCENCE     4.723   0.855    CELL                   0.002  28.389
9     EDUC URBAN SOC          4.653   0.224    JAMA-J AM MED ASSOC    1.196  24.831
10    J AUTISM DEV DISORD     4.513   2.128    ANNU REV NEUROSCI      0.048  23.143

Table 1: Journals ranked by 2004 CSU UIF and 2004 ISI IF values.

Figure 3: CSU Usage Impact Factor and ISI Impact Factor values for 3,146 journals.

3.3 Discipline-specific comparisons 

The scatterplot in Fig. 3 suggests that the relationship
between the 2004 CSU UIF and 2004 ISI IF values
differs for particular disciplines, e.g. among the set of
journals with high ISI IF values and low CSU UIF 
values we find a preponderance of physics journals. It 
is therefore warranted to assess the CSU UIF and ISI 
IF correlations within, rather than between, individual 
scholarly disciplines. 



The disciplines used by CSU to tally enrollment and 
faculty numbers in its Statistical Abstracts (Analytic 
Studies Division, 2004) are the starting point of the 
discipline-specific comparisons of 2004 CSU UIF and 
2004 ISI IF values in this paper. These disciplines are
listed in Table 2 (reproduced from Analytic Studies
Division (2004), page 125, table 81).



Disciplines

Agriculture and Natural Resources, Architecture and Environmental
Design, Area Studies, Biological Sciences, Business and Management,
Communications, Computer and Information Sciences, Education,
Engineering, Fine and Applied Arts, Foreign Languages,
Health Professions, Home Economics, Interdisciplinary Studies,
Letters, Library Science, Mathematics, Physical Sciences, Psychology,
Public Affairs, Social Sciences



Table 2: California State University disciplines used to 
tally enrollment and faculty numbers. 

To separate the examined journals into
discipline-related sets, we manually matched each of
the listed CSU disciplines with a set of ISI journal
classification codes⁷. These classification codes were
then used to demarcate discipline-related sets of journals
within which a comparison of CSU UIF and ISI IF could
be conducted. The ISI journal classification codes for
the CSU disciplines listed in Table 2 are provided in
a table in the appendix. The 2004 CSU UIF and 2004 ISI IF
correlations calculated for each of the thus demarcated
CSU disciplines are listed in Table 3. Statistically
significant correlations were found
for only 3 of the 17 disciplines, namely Interdisciplinary
Studies (ρ = -0.470, N = 89, p < 0.001), Education
(ρ = 0.228, N = 127, p = 0.010) and Engineering
(ρ = -0.147, N = 259, p = 0.018). Physical Sciences
was found to have a marginally significant, negative
correlation (ρ = -0.225, N = 56, p = 0.096). Log-log
scaled scatterplots of the 2004 CSU UIF vs. 2004 ISI IF
values for these four disciplines are shown in
Fig. 4 and confirm the reported correlations.

⁷ This is a subjective matter. However, specific care was taken to
match ISI Journal Classification Codes as literally as possible to the
specific CSU disciplines.
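The discipline-specific comparison can be sketched as grouping journals by their classification codes and computing Spearman's rho within each group. This is a minimal illustration under stated assumptions: the input layout, the function `per_discipline_rho`, the `min_n` cut-off and the placeholder codes `"XA"` and `"YB"` are hypothetical (no real ISI category codes are reproduced), and a journal is assumed to join every discipline with which it shares at least one code.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, List, Sequence, Set, Tuple


def _ranks(v: Sequence[float]) -> List[float]:
    """1-based ranks; ties get the average of the ranks they span."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def _spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of ranks; NaN if either variable is constant."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy) if vx and vy else float("nan")


def per_discipline_rho(
    journals: Dict[str, Tuple[float, float, Set[str]]],
    discipline_codes: Dict[str, Set[str]],
    min_n: int = 3,
) -> Dict[str, Tuple[float, int]]:
    """Spearman rho between UIF and ISI IF within each discipline.

    journals: title -> (uif, isi_if, set of classification codes)
    discipline_codes: discipline -> codes manually matched to it
    """
    groups: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for uif, isi_if, codes in journals.values():
        for discipline, wanted in discipline_codes.items():
            if codes & wanted:  # journal shares at least one code with the discipline
                groups[discipline].append((uif, isi_if))
    result = {}
    for discipline, pairs in groups.items():
        if len(pairs) >= min_n:  # skip disciplines with too few journals
            uifs = [p[0] for p in pairs]
            ifs = [p[1] for p in pairs]
            result[discipline] = (_spearman(uifs, ifs), len(pairs))
    return result


# Placeholder codes "XA"/"YB" and invented values, for illustration only.
journals = {
    "J1": (1.0, 2.0, {"XA"}),
    "J2": (2.0, 3.0, {"XA"}),
    "J3": (3.0, 1.0, {"XA"}),
    "J4": (0.5, 9.0, {"YB"}),
}
print(per_discipline_rho(journals, {"Education": {"XA"}, "Engineering": {"YB"}}))
# {'Education': (-0.5, 3)}
```

Computing rho within each group, rather than over the pooled set, prevents a discipline that is both heavily cited and rarely used locally (such as physics) from dominating the overall coefficient.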



2004 CSU UIF vs. 2004 ISI IF

Discipline                            rho     N  p-value
Interdisciplinary Studies          -0.470    89   <0.001
Education                          +0.228   127    0.010
Engineering                        -0.147   259    0.018
Physical Sciences                  -0.225    56    0.096
Agriculture and Natural Resources  +0.238    40    0.138
Business and Management            +0.132   115    0.160
Computer and Information Sciences  +0.077   155    0.338
Area Studies                       +0.169    27    0.397
Public Affairs                     -0.073   106    0.455
Library                            +0.126    25    0.546
Psychology                         +0.033   316    0.556
Architect. and Environ. Design     +0.041   188    0.572
Mathematics                        +0.077    44    0.617
Biological Sciences                -0.024   331    0.669
Communications                     +0.049    58    0.712
Social Sciences                    +0.026    59    0.843
Health Professions                 -0.012   126    0.890

Table 3: Discipline-specific 2004 CSU UIF and 2004 ISI
IF Spearman rank-order correlations.



It is of particular interest that three out of the four 
mentioned disciplines exhibit a negative correlation 
between 2004 CSU UIF and 2004 ISI IF values. Whereas 
a zero correlation would have indicated the absence of 
a relationship, in this case the two metrics are inversely 
correlated, indicating that members of the communities
interested in the particular discipline specifically do 
not frequently use articles published in high-impact 
journals and vice versa. However, for Education a 
significant positive correlation was found between the 
2004 CSU UIF and 2004 ISI IF, indicating that for this 
particular CSU discipline journal usage is moderately 
related to scholarly impact as indicated by the 2004 ISI IF. 

The size of a discipline in terms of the number of
journals that it comprises may affect ISI IF values. A
marginally significant correlation was found between the
CSU UIF vs. ISI IF correlation and the number of
journals in that particular discipline (ρ = -0.459, N = 17,
p = 0.065). However, the correlation between CSU
UIF and ISI IF values was not affected by the total
number of students enrolled in a particular discipline. No
statistically significant correlation was found between total
student enrollment numbers and the CSU UIF vs. ISI IF
correlations (ρ = -0.262, N = 17, p = 0.308).

[Figure 4 panels include Education (ρ = 0.23) and Engineering (ρ = -0.15); x-axis: UIF04.]

Figure 4: CSU UIF and ISI IF comparisons for 4 disciplines
with highest and lowest correlations.

3.4 Community demographics 

On the basis of the hypothesis that the observed corre- 
lations between CSU UIF and ISI IF values for these 
disciplines may be related to the academic demographics 
of the CSU communities corresponding to the inves- 
tigated disciplines, 2004 undergraduate and graduate 
enrollment and faculty numbers were matched to the 
observed correlations. Faculty numbers are estimated 
in terms of Full Time Equivalent Faculty (FTEF), i.e. 
the total number of hours taught in a particular division 
divided by the assumed 15 hours required for full-time 
faculty status. The numbers of FTEF teaching and of
students enrolled at the undergraduate and graduate
levels are listed in Table 4. Note that
undergraduate FTEF numbers are split into low and high 
divisions which need to be summed to determine total 
undergraduate FTEFs. 
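The FTEF estimate described above reduces to two small arithmetic rules, sketched below. The constant name `HOURS_PER_FTEF` and the function names are hypothetical; the 15-hour full-time load is the assumption stated in the text, and the second example simply sums the low- and high-division FTEF of the Education row of Table 4.

```python
HOURS_PER_FTEF = 15.0  # teaching hours assumed to constitute one full-time faculty position


def ftef(hours_taught: float) -> float:
    """Full Time Equivalent Faculty: total hours taught divided by the full-time load."""
    return hours_taught / HOURS_PER_FTEF


def undergraduate_ftef(low_division: float, high_division: float) -> float:
    """Total undergraduate FTEF is the sum of the low- and high-division FTEF."""
    return low_division + high_division


print(ftef(750))                                  # 50.0 (invented input)
print(round(undergraduate_ftef(49.6, 750.7), 1))  # 800.3 (Education row of Table 4)
```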
Discipline          Stud. U.Grad  Stud. Grad.  FTEF Low  FTEF High  FTEF Grad.
Agri. & Nat. Res.          5,381          302      62.7      127.5        21.0
Arch. & Env. Des.          2,902          358      33.9       72.1        19.2
Area Studies                 319          148      12.9       25.1         4.3
Biol. Sci.                13,642        1,052     243.3      264.7        89.1
Bus. & Mngmt.             60,069        5,242     143.3      914.4       161.3
Communications            14,252          674     139.5      299.5        31.4
Comp. & Inf. Sci.         16,415        2,322     119.7      223.8        68.3
Education                 16,084       15,452      49.6      750.7       836.6
Engineering               22,877        4,146     191.8      483.6       123.9
Fine & Appl. Arts         19,418        1,321     425.3      712.1       102.0
Foreign Lang.              2,252          486     226.2      138.5        21.2
Health Prof.              13,386        3,984      31.2      142.9       143.1
Home Econ.                 3,261          738      29.4       93.0        16.4
Interdisc. Stud.          29,780          948     146.6      225.5        24.8
Letters                   13,594        3,413     729.6      691.6       170.9
Library                        –          561       6.6        2.0        17.3
Mathematics                3,325          816     488.6      189.8        48.5
Phys. Sci.                 3,310          741     425.6      320.2        75.3
Psychology                16,944        1,380      84.6      332.9       108.9
Public Affairs            14,250        4,643      47.4      287.0       216.8
Social Sciences           24,597        2,956     570.4    1,081.9       162.8

Table 4: California State University student enrollment
and Full Time Equivalent Faculty (FTEF) numbers
(undergraduate and graduate) for 2004.

Three ratios of the undergraduate versus the graduate 
community were defined as follows: 

1. All: the ratio of total graduate student enrollment 
plus graduate FTEF numbers over the total number 
of undergraduate student enrollment plus undergrad- 
uate (high and low division combined) FTEF num- 
bers. 

2. Student: the ratio of graduate over undergraduate 
student enrollment. 

3. Faculty: the ratio of graduate FTEF numbers over 
undergraduate FTEF numbers. 
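The three ratios can be written directly from their definitions. The sketch below is a minimal illustration: the `Demographics` record and `grad_ratios` function are hypothetical names, and the example applies the definitions to the Engineering row of Table 4, combining the low- and high-division FTEF as the undergraduate faculty figure.

```python
from typing import Dict, NamedTuple


class Demographics(NamedTuple):
    """One row of Table 4: enrollment and FTEF for a discipline."""
    ug_students: float
    grad_students: float
    ftef_low: float    # low-division undergraduate FTEF
    ftef_high: float   # high-division undergraduate FTEF
    ftef_grad: float


def grad_ratios(d: Demographics) -> Dict[str, float]:
    """The three graduate-vs-undergraduate ratios defined in the text."""
    ug_ftef = d.ftef_low + d.ftef_high  # low and high division combined
    return {
        "All": (d.grad_students + d.ftef_grad) / (d.ug_students + ug_ftef),
        "Student": d.grad_students / d.ug_students,
        "Faculty": d.ftef_grad / ug_ftef,
    }


engineering = Demographics(22_877, 4_146, 191.8, 483.6, 123.9)
print({k: round(v, 3) for k, v in grad_ratios(engineering).items()})
# {'All': 0.181, 'Student': 0.181, 'Faculty': 0.183}
```

Because student numbers are far larger than FTEF numbers, the "All" ratio is dominated by enrollment and tends to track the "Student" ratio closely.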

The thus defined ratios were then compared to the
observed CSU UIF vs. ISI IF correlations in Table 3.
It must be stressed that this comparison was restricted to
the four disciplines mentioned above for which significant or
marginally significant CSU UIF vs. ISI IF correlations
were observed. The results are listed in Table 5 and
suggest the possibility of a relationship between the 
ratio of the graduate to undergraduate community within 
a discipline and the observed CSU UIF vs. ISI IF 
correlations. 

In particular, the discipline of Interdisciplinary Studies
is characterized by a roughly 15 to 1 ratio of undergraduate
to graduate students, and a roughly 30 to 1 ratio of
undergraduate to graduate faculty. A highly significant
negative CSU UIF vs. ISI IF correlation was observed for
this discipline.

Conversely, Education is characterized by a roughly 1 to 1
ratio of undergraduate students and faculty to graduate
students and faculty. A significant positive correlation
was observed between journal CSU UIF and ISI IF values
within this discipline.

This pattern is further confirmed by the undergraduate
vs. graduate ratios for Engineering and Physical Sciences,
which are moderate at roughly 5 to 1 and 10 to 1
respectively, and for which moderate negative CSU
UIF vs. ISI IF correlations were observed.

A linear regression model was fitted to the relation
between the ratio of graduate to undergraduate numbers
and the observed 2004 CSU UIF vs. 2004 ISI IF correlation,
on the basis of the 4 data points listed in Table 5. Since
similar results were obtained for all three demographic
ratios ("All", "Student" and "Faculty"), only the linear
regression model for the combined student and faculty
ratios ("All") is discussed. Fig. 5 shows a scatterplot of
these values and the corresponding linear regression
model. The model has an intercept of -0.3873 and a slope
of 0.7183 (r² = 0.9029).
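The fit can be reproduced from the four "All" ratios and correlations in Table 5. The Python sketch below is a plain ordinary least-squares fit (the tooling actually used for the reported model is not stated); small differences in the fourth decimal from the reported values presumably stem from the rounding of the ratios in Table 5.

```python
# Table 5: discipline -> ("All" grad/undergrad ratio, 2004 CSU UIF vs. ISI IF correlation)
TABLE5 = {
    "Interdisciplinary Studies": (0.032, -0.470),
    "Physical Sciences": (0.202, -0.225),
    "Engineering": (0.180, -0.147),
    "Education": (0.888, 0.228),
}


def ols(points):
    """Ordinary least-squares fit y = a + b*x; returns (intercept, slope, r_squared)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return intercept, slope, r_squared


a, b, r2 = ols(list(TABLE5.values()))
print(f"intercept={a:.3f} slope={b:.3f} r2={r2:.3f}")
# intercept=-0.387 slope=0.718 r2=0.903
```

With only four points the fit is fragile: removing the single Education point would change the slope substantially, which is consistent with the caution expressed in the Conclusion.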

From this model it would follow that CSU UIF vs. ISI IF
correlations become positive once the graduate community
exceeds roughly half the size of the undergraduate
community in a particular discipline (a ratio of
0.3873 / 0.7183 ≈ 0.54). Note that the overall ratio of
graduate to undergraduate enrollment for the entire CSU
system is 51,694 / 326,483 = 0.158, which, together with
the observed overall UIF vs. ISI IF correlation of
ρ = -0.207 (p-value < 0.001, N=3,164), supports the
above pattern.
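As a worked check using the reported intercept and slope, let x denote the graduate-to-undergraduate ratio and \hat{\rho}(x) the predicted CSU UIF vs. ISI IF correlation. Setting the prediction to zero gives the crossover ratio, and evaluating it at the CSU-wide enrollment ratio gives the predicted system-wide correlation:

```latex
\begin{aligned}
\hat{\rho}(x) &= -0.3873 + 0.7183\,x \\
\hat{\rho}(x^{*}) = 0 \;&\Rightarrow\; x^{*} = \frac{0.3873}{0.7183} \approx 0.539 \\
\hat{\rho}(0.158) &= -0.3873 + 0.7183 \times 0.158 \approx -0.274
\end{aligned}
```

The crossover ratio x* ≈ 0.54 corresponds to a graduate community about half the size of the undergraduate community. The CSU-wide prediction of about -0.274 has the same sign as the observed ρ = -0.207, although 0.158 is an enrollment-only ratio rather than the combined "All" ratio used to fit the model, so the comparison is approximate.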

3.5 Baseline assessment 

The 2004 ISI IF is used as a baseline assessment of
scholarly impact against which 2004 CSU UIF values
can be compared. Although CSU UIF and ISI IF are
deliberately compared for the same year in which usage,
citation and publication samples were recorded, questions
arise with regard to the sensitivity of this comparison to
changes in the ISI IF over time.

For this reason we investigated the degree of correlation
between the 2004 CSU UIF and past ISI IF values,
                                                      Grad. vs. Undergrad. ratio
Discipline                  ρ(UIF,IF)     N   p-value    Student  Faculty    All
Interdisciplinary Studies    -0.470      89    0.000      0.067    0.032    0.032
Physical Sciences            -0.225      56    0.096      0.101    0.224    0.202
Engineering                  -0.147     259    0.018      0.183    0.180    0.180
Education                    +0.228     127    0.010      1.045    0.881    0.888

Table 5: 2004 CSU UIF and ISI IF correlations compared to graduate vs. undergraduate ratios of student and faculty numbers.




Figure 5: Fall 2004 graduate vs. undergraduate student and faculty population ratios compared to 2004 CSU UIF vs. ISI IF correlations.



i.e. ISI IF values published from 1997 through 2004.⁸
The results are listed in Table 6. These correlations
indicate a stable negative correlation between 2004 CSU
UIF values and past ISI IF values over the 8-year period.
The absence of a particular trend in CSU UIF vs. ISI IF
correlations is supported by the plot in Table 6. The
scatterplots of CSU UIF vs. ISI IF values for each
specific year are shown in Fig. 6.
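Table 6 reports Spearman rank-order correlations. As a sketch of how such a value is obtained, the Python code below computes Spearman's ρ as the Pearson correlation of tie-averaged ranks, restricted to journals that have both a CSU UIF and an ISI IF value. The journal identifiers and values are hypothetical, and the idea that N in Table 6 varies by year because the set of journals with an ISI IF differs from year to year is our assumption. For real analyses, `scipy.stats.spearmanr` computes the same coefficient and also returns a p-value.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical journal-level values; only journals present in both maps are compared.
uif_2004 = {"j1": 3.1, "j2": 0.4, "j3": 1.7, "j4": 2.2}
isi_if = {"j1": 0.9, "j2": 2.5, "j3": 1.1, "j5": 4.0}
shared = sorted(uif_2004.keys() & isi_if.keys())
rho = spearman_rho([uif_2004[j] for j in shared], [isi_if[j] for j in shared])
print(len(shared), rho)  # 3 -1.0
```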

3.6 Results Summary 

The picture that emerges from these results can be sum- 
marized as follows: 

1. A moderate negative correlation between 2004 CSU
UIF and 2004 ISI IF values was found when CSU
disciplines are not taken into account.

2. This negative correlation persists over a period of 8
years, counting back ISI IF values from the year in
which usage was recorded (2004).

8 At the time this analysis was conducted, 2005 ISI IF values were 
not yet available. 




Figure 6: CSU UIF vs. ISI IF comparisons for the 1997-2004 period.



3. Some CSU disciplines exhibit negative correlations
between CSU UIF and ISI IF values, whereas others
exhibit positive correlations. Most disciplines, however,
exhibit near-zero or statistically insignificant correlations.

4. CSU UIF vs. ISI IF correlations appear to be related
to the ratio between the sizes of the graduate and
undergraduate communities in a discipline.






[Plot: CSU UIF vs. ISI IF correlations by ISI IF year]



ISI IF year      1997    1998    1999    2000    2001    2002    2003    2004
2004 CSU UIF   -0.186  -0.159  -0.170  -0.171  -0.197  -0.203  -0.204  -0.207
N                2636    2750    2819    2892    2960    3050    3096    3146
p-value        <0.001  <0.001  <0.001  <0.001  <0.001  <0.001  <0.001  <0.001

Table 6: Spearman rank-order correlation values between the 2004 Usage Impact Factor and the 1997-2004 ISI Impact Factors.



4 Conclusion 

Usage-based metrics of scholarly impact are gradually
gaining acceptance in the domain of bibliometrics.
However, little attention has been paid to how usage-based
impact assessments are influenced by the demographic
and scholarly characteristics of particular communities.
Our analysis of CSU usage data indicates significant,
community-based deviations between local usage impact
and global citation impact, as indicated by the generated
CSU UIF and ISI IF rankings respectively. In particular,
we found a general negative correlation between the
CSU UIF and the ISI IF, which indicates that usage across
the entire CSU community is inversely related to general
citation impact.

The observed negative correlations between the CSU
UIF and ISI IF run counter to previous findings:
Brody et al. (2006) and Bollen et al. (2005) report positive
correlations between usage and citation rates. However,
the services that recorded this usage, namely the UK
arXiv mirror and the LANL Research Library systems,
mostly serve a community of scholars in computer
science and physics. The CSU community for which
usage was recorded is composed of a mix of students,
faculty, staff and others, focused on a variety of science
and social science domains. We speculate that both the
nature of the CSU library collection and the CSU
community that uses it contributed to the negative
correlations between CSU UIF and ISI IF values.

At the level of specific scholarly disciplines, however,
both positive and negative CSU UIF vs. ISI IF correlations
were observed. In addition, a comparison of the relative
sizes of the undergraduate and graduate communities at
CSU to the CSU UIF vs. ISI IF correlations within specific
disciplines suggested that the size of the graduate
community (students and faculty) relative to that of the
undergraduate community within a discipline could be
related to the direction and magnitude of the observed
CSU UIF vs. ISI IF correlations. The tentative linear
relationship observed between the ratio of graduate to
undergraduate enrollment and CSU UIF vs. ISI IF
correlations raises the possibility that applications of
usage data could take demographic data into account to
extract different facets of impact. We must, however,
caution that these observations are based on only the 4
disciplines for which significant or marginally significant
CSU UIF vs. ISI IF correlations were observed. Future
research could focus on validating these tentative results
for a larger number of disciplines.

In Section 12.3.11 we distinguished two factors that
shape metric-based assessments of scholarly impact,
namely the formal definition of a metric and the sample
to which it has been applied. Although the UIF has been
defined to mimic the IF, the CSU UIF and ISI IF rankings
in this manuscript have been generated for very different
samples of the scholarly community. The ISI IF rests on
citation data collected for a set of ISI-selected journals;
its rankings therefore reflect the global community of
scholarly authors publishing in those journals. The CSU
usage data, on the other hand, reflect the characteristics
of the local CSU academic community, which comprises a
mix of students and faculty, among others. This sample can
therefore be considered more diverse than the ISI-defined
sample in terms of its composition, yet more limited in
its span, since it applies to CSU users only.

We envision three future paths along which usage-based
metrics such as the UIF can be developed. These
paths are not mutually exclusive and are related to the
issues mentioned in the introduction.

The first path is one in which attempts are made to
mimic the properties of the ISI IF on the basis of usage
data. This requires (1) the aggregation of a meaningful,
representative sample of the scholarly community, similar
in span to the ISI IF sample, and (2) efforts to compensate
for the increased diversity of the usage data sample,
e.g. by excluding all agents that are not scholarly authors
and by taking into account discipline-specific demographics
and preferences. This article has provided an initial
exploration of the second issue, whereas the architecture
described by Bollen and Van de Sompel (2006b) may offer
at least a technical solution to the first issue. Questions
remain as to how one can create a truly representative
usage sample of the global scholarly community.

The second path along which usage-based metrics of
scholarly status can be developed focuses on leveraging
the greater diversity (in terms of agents and community
characteristics) that usage data generally engenders. This
path may still require the aggregation of a meaningful,
representative sample of the scholarly community, but
its assessment of scholarly impact specifically leverages
sample diversity to assess the many different facets of
impact as they exist in the scholarly community. Indeed,
one could argue that an article that is often read by
students, yet seldom cited by scholars in its field,
nevertheless has considerable scholarly impact. In fact,
on the basis of sufficiently detailed usage data, impact
could be assessed separately for any subset of the
scholarly community, including undergraduate and
graduate students, research faculty, lecturers and the
public at large.
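As an illustration of assessing impact separately for subsets of users, the Python sketch below computes a per-journal UIF restricted to one user group. It assumes a UIF defined by analogy with the ISI IF (uses recorded in year y of articles published in years y-1 and y-2, divided by the number of articles published in those two years) and a usage log in which each request carries a user-group label. The `UsageEvent` record, the group names and the data are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    journal: str   # hypothetical journal identifier
    pub_year: int  # publication year of the requested article
    use_year: int  # year in which the request was logged
    group: str     # hypothetical user-group label, e.g. "undergraduate"


def usage_impact_factor(events, articles_published, year, group=None):
    """Per-journal UIF for `year`, optionally restricted to one user group.

    articles_published maps (journal, pub_year) -> number of articles.
    Assumed definition: uses in `year` of items published in year-1 and
    year-2, divided by the number of items published in those two years.
    """
    window = (year - 1, year - 2)
    uses = Counter(
        e.journal
        for e in events
        if e.use_year == year
        and e.pub_year in window
        and (group is None or e.group == group)
    )
    uif = {}
    for journal in {j for (j, _) in articles_published}:
        n_items = sum(articles_published.get((journal, py), 0) for py in window)
        if n_items > 0:  # skip journals with no items in the window
            uif[journal] = uses[journal] / n_items  # Counter yields 0 if unused
    return dict(sorted(uif.items()))


events = [
    UsageEvent("J1", 2003, 2004, "undergraduate"),
    UsageEvent("J1", 2003, 2004, "undergraduate"),
    UsageEvent("J1", 2002, 2004, "graduate"),
    UsageEvent("J2", 2002, 2004, "faculty"),
    UsageEvent("J2", 2001, 2004, "faculty"),  # outside the two-year window
]
published = {("J1", 2002): 2, ("J1", 2003): 2, ("J2", 2002): 1, ("J2", 2003): 1}
print(usage_impact_factor(events, published, 2004))                   # {'J1': 0.75, 'J2': 0.5}
print(usage_impact_factor(events, published, 2004, "undergraduate"))  # {'J1': 0.5, 'J2': 0.0}
```

The same filter would also serve the first path described above: restricting `group` to scholarly authors approximates the ISI IF's author-only sample, while varying it exposes the different facets of impact.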

Finally, where only local usage data is collected, there
is still particular value in being able to determine local
impact rankings that correspond to the preferences and
characteristics of specific communities such as CSU. The
CSU UIF generated in this article may not be globally
applicable, but it offers CSU administrators an interesting
perspective on what is valued in their community. Our
analysis indicates that considerable, yet locally
meaningful, deviations can occur between impact as it is
perceived by particular scholarly disciplines and the ISI
IF. Such deviations need not be problematic; they offer
considerable opportunities to optimize local information
services and to adopt policies that accommodate the
preferences of local communities.

Many issues remain to be addressed in future research
on this topic. The Andrew W. Mellon Foundation has
awarded a grant to our team to investigate a range of issues
related to the definition of usage-based metrics of scholarly
impact. The funded project, named MESUR⁹, aims to
construct a large-scale model of the scholarly community
which merges usage and bibliographic data to support the
definition and validation of a range of usage-based metrics
of scholarly status. This paper describes our first
explorations in this research area.
Appendix



Agriculture and Natural Resources: AD (AGRICULTURE, DAIRY & ANIMAL SCIENCE), AE (AGRICULTURAL ENGINEERING), AF (AGRICULTURAL ECONOMICS & POLICY), AH (AGRICULTURE, MULTIDISCIPLINARY), XE (AGRICULTURE, SOIL SCIENCE)

Architecture and Environmental Design: IH (ENGINEERING, ENVIRONMENTAL), JA (ENVIRONMENTAL SCIENCES), NE (PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH), JB (ENVIRONMENTAL STUDIES)

Area Studies: BM (AREA STUDIES)

Biological Sciences: CQ (BIOCHEMISTRY & MOLECULAR BIOLOGY), CU (BIOLOGY), DB (BIOTECHNOLOGY & APPLIED MICROBIOLOGY), DR (CELL BIOLOGY), HT (EVOLUTIONARY BIOLOGY), HY (DEVELOPMENTAL BIOLOGY), PI (MARINE & FRESHWATER BIOLOGY), QU (MICROBIOLOGY), WF (REPRODUCTIVE BIOLOGY), BV (PSYCHOLOGY, BIOLOGICAL)

Business and Management: DI (BUSINESS), DK (BUSINESS, FINANCE), PE (OPERATIONS RESEARCH & MANAGEMENT SCIENCE), PC (MANAGEMENT)

Communications: YE (TELECOMMUNICATIONS), EU (COMMUNICATION)

Computer and Information Sciences: EP (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE), ER (COMPUTER SCIENCE, CYBERNETICS), ES (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE), ET (COMPUTER SCIENCE, INFORMATION SYSTEMS), EV (COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS), EW (COMPUTER SCIENCE, SOFTWARE ENGINEERING), EX (COMPUTER SCIENCE, THEORY & METHODS), PT (MEDICAL INFORMATICS), NU (INFORMATION SCIENCE & LIBRARY SCIENCE)

Education: HB (EDUCATION, SCIENTIFIC DISCIPLINES), HA (EDUCATION & EDUCATIONAL RESEARCH), HE (EDUCATION, SPECIAL), HI (PSYCHOLOGY, EDUCATIONAL)

Engineering: AE (AGRICULTURAL ENGINEERING), AI (ENGINEERING, AEROSPACE), EW (COMPUTER SCIENCE, SOFTWARE ENGINEERING), IF (ENGINEERING, MULTIDISCIPLINARY), IG (ENGINEERING, BIOMEDICAL), IH (ENGINEERING, ENVIRONMENTAL), II (ENGINEERING, CHEMICAL), IJ (ENGINEERING, INDUSTRIAL), IK (ENGINEERING, MANUFACTURING), IL (ENGINEERING, MARINE), IM (ENGINEERING, CIVIL), IO (ENGINEERING, OCEAN), IP (ENGINEERING, PETROLEUM), IQ (ENGINEERING, ELECTRICAL & ELECTRONIC), IU (ENGINEERING, MECHANICAL), IX (ENGINEERING, GEOLOGICAL), PZ (METALLURGY & METALLURGICAL ENGINEERING)

Fine and Applied Arts: No results
Foreign Languages: No results

Health Professions: HL (HEALTH CARE SCIENCES & SERVICES), NE (PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH), LQ (HEALTH POLICY AND SERVICES)

Home Economics: No results

Interdisciplinary Studies: EV (COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS), PO (MATHEMATICS, INTERDISCIPLINARY APPLICATIONS), WU (SOCIAL SCIENCES, INTERDISCIPLINARY)

Letters: No results

Library: NU (INFORMATION SCIENCE & LIBRARY SCIENCE)

Mathematics: PN (MATHEMATICS, APPLIED), PO (MATHEMATICS, INTERDISCIPLINARY APPLICATIONS), PQ (MATHEMATICS)

Physical Sciences: UB (PHYSICS, APPLIED), UF (PHYSICS, FLUIDS & PLASMAS), UH (PHYSICS, ATOMIC, MOLECULAR & CHEMICAL), UI (PHYSICS, MULTIDISCIPLINARY), UK (PHYSICS, CONDENSED MATTER)

Psychology: VI (PSYCHOLOGY), BV (PSYCHOLOGY, BIOLOGICAL), EQ (PSYCHOLOGY, CLINICAL), HI (PSYCHOLOGY, EDUCATIONAL), MY (PSYCHOLOGY, DEVELOPMENTAL), NQ (PSYCHOLOGY, APPLIED), VJ (PSYCHOLOGY, MULTIDISCIPLINARY), VP (PSYCHOLOGY, PSYCHOANALYSIS), VS (PSYCHOLOGY, MATHEMATICAL), VX (PSYCHOLOGY, EXPERIMENTAL), WQ (PSYCHOLOGY, SOCIAL)

Public Affairs: NE (PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH), VM (PUBLIC ADMINISTRATION)

Social Sciences: PS (SOCIAL SCIENCES, MATHEMATICAL METHODS), WU (SOCIAL SCIENCES, INTERDISCIPLINARY), WV (SOCIAL SCIENCES, BIOMEDICAL)

Table 7: ISI journal classification codes for CSU disciplines listed in Table 2.